The data included in this analysis were assembled by earth scientist Zander Venter. Venter acquired all data from publicly available remote sensing datasets through Google Earth Engine. Averages were calculated for each country “at a reduction scale of about 10 km” and include measurements such as temperature, rainfall, elevation, and canopy cover.
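# Attach packages:
library(tidyverse) # read_csv(), dplyr verbs, ggplot2
library(here)      # project-relative file paths
library(janitor)   # clean_names()
library(ggfortify) # autoplot() method for prcomp objects
library(plotly)    # ggplotly()
# Read in data:
world_env_vars <- read_csv(here("data", "world_env_vars.csv")) %>%
  clean_names()
# Create PCA dataset:
world_env_pca <- world_env_vars %>%
  select_if(is.numeric) %>% # select only numeric columns
  select(-c(ends_with("_quart"))) %>% # remove any column ending in '_quart'
  select(-c(ends_with("_month"))) %>% # remove any column ending in '_month'
  select(-c("accessibility_to_cities", "aspect")) %>% # omit 2 additional columns
  drop_na() %>% # drop rows with any remaining NA
  scale() %>% # center and scale variables measured on different orders of magnitude
  prcomp()
world_env_pca$rotation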
## PC1 PC2 PC3 PC4
## elevation 0.1310711077 0.04583268 -0.659947716 0.004699318
## slope 0.0003828036 0.23567957 -0.557324797 0.269104139
## cropland_cover 0.1465631407 0.25761056 0.300660652 -0.411449208
## tree_canopy_cover -0.3503012176 0.21686452 -0.097420615 -0.114166226
## isothermality -0.3746179043 -0.24622326 -0.091582555 0.029076552
## rain_mean_annual -0.4008209565 0.13952922 -0.085473543 -0.059303682
## rain_seasonailty 0.0621198382 -0.47092794 -0.115192828 -0.004082272
## temp_annual_range 0.4129875297 0.03262162 -0.067977001 -0.188143053
## temp_diurnal_range 0.1899487615 -0.44263819 -0.178920279 -0.200944831
## temp_mean_annual -0.2662247385 -0.41834393 0.115315504 -0.047042528
## temp_seasonality 0.3919145761 0.18740130 -0.006121805 -0.112541168
## wind 0.1466166122 0.05606025 0.273832445 0.795355755
## cloudiness -0.2848952864 0.34180006 -0.007437192 -0.132708475
## PC5 PC6 PC7 PC8 PC9
## elevation 0.17304988 -0.39497332 -0.11825278 0.10027078 -0.458545918
## slope 0.31658777 0.46089248 0.18127610 -0.02301636 0.381741531
## cropland_cover 0.68018096 0.06402613 -0.01314082 0.41315499 -0.056064033
## tree_canopy_cover -0.37325679 0.22839624 0.03645149 0.46521740 -0.447028020
## isothermality 0.07435174 -0.28873837 0.13563883 0.15836826 0.175963122
## rain_mean_annual -0.13469390 0.21697761 -0.09541207 0.33095187 0.257088331
## rain_seasonailty 0.12036062 0.35836014 -0.74072109 0.06431418 -0.078076512
## temp_annual_range -0.32085131 0.07245109 -0.06380262 0.16957173 0.203646619
## temp_diurnal_range -0.08563892 -0.25117385 0.19691623 0.46714546 0.368130439
## temp_mean_annual 0.09075995 0.23150309 0.11578585 -0.04035345 -0.052718639
## temp_seasonality -0.32975952 0.17388444 -0.06745553 0.03839569 0.071125581
## wind 0.03304784 -0.13113962 -0.13688404 0.45267744 0.006109216
## cloudiness -0.03676129 -0.38350274 -0.54400749 -0.09244911 0.391166274
## PC10 PC11 PC12 PC13
## elevation -0.34990735 -0.04542848 0.01946759 0.006023194
## slope 0.17678677 -0.19239681 -0.03625869 -0.008670464
## cropland_cover -0.01100688 -0.05803955 0.09060058 -0.003489179
## tree_canopy_cover 0.36781088 -0.23020098 -0.10054564 -0.016489872
## isothermality 0.18393542 -0.07976703 0.76316733 -0.061999094
## rain_mean_annual -0.55641773 0.49870008 0.01276623 0.025245322
## rain_seasonailty 0.19616973 0.10925026 0.09110318 0.023565231
## temp_annual_range -0.13372765 -0.17532063 0.15470482 -0.730282444
## temp_diurnal_range 0.15493793 0.01601380 -0.38228732 0.249500666
## temp_mean_annual -0.49520055 -0.63226837 -0.11818559 -0.001518365
## temp_seasonality -0.17402912 -0.22823763 0.41121018 0.630117369
## wind -0.09618194 -0.12631299 -0.02211008 -0.010481027
## cloudiness 0.04837631 -0.37249703 -0.18834439 0.042951517
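The loadings above show how each variable enters each component, but not how much of the total variance each component accounts for. A minimal sketch of that check follows. It uses the built-in `USArrests` data as a stand-in (an assumption, since `world_env_vars.csv` is not reproduced here), and the object names are illustrative only. The same lines should carry over to `world_env_pca`.

```r
# Stand-in data: USArrests ships with base R (NOT the data used in this analysis)
example_pca <- prcomp(scale(USArrests))

# The variance of each component is the square of its standard deviation
pc_variance <- example_pca$sdev^2

# Proportion of total variance explained by each component
prop_variance <- pc_variance / sum(pc_variance)

# Running total across components
cum_variance <- cumsum(prop_variance)

round(prop_variance, 3)
round(cum_variance, 3)

# summary() reports the same quantities in its importance table
summary(example_pca)$importance
```

If the first two components account for only a modest share of the variance, the two-dimensional biplot is a correspondingly rough picture of the full data.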
# Complete cases only; these rows must match the rows used in the PCA so that country labels line up
world_env_complete <- world_env_vars %>%
  drop_na()
# Generate PCA plot:
# Assumptions: linear relationships between variables, continuous measured variables, sufficient sample size
biplot <- autoplot(world_env_pca,
                   data = world_env_complete,
                   colour = "country",
                   loadings = TRUE,
                   loadings.colour = "gray60", # changes color of the arrows
                   loadings.label = TRUE, # displays variable names
                   loadings.label.colour = "black", # changes font color
                   loadings.label.vjust = 1) +
  # geom_text(aes(label = country), col = "gray50", size = 2) + # use this to add country names
  theme_minimal() +
  theme(legend.position = "none") # remove legend
ggplotly(biplot)
Figure 1: PCA biplot of the 188 countries with no missing values for the included variables. Gray arrows show the loading of each variable on the first two principal components and are labeled by variable name. Each point's position gives that country's scores on those two components. The length of an arrow indicates how much of that variable's variance is captured by the first two components; the shorter the arrow, the less of its variance lies in the plotted plane. The angle between two arrows indicates the correlation between the variables: a small angle suggests a positive correlation, an angle near 90° little correlation, and an angle near 180° a negative correlation. The plot is interactive; hover over a point to view the country name. (Data courtesy of Zander Venter.)
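The reading of arrow lengths and angles in the caption can be checked numerically. The sketch below is an illustration under stated assumptions: it uses the built-in `USArrests` data rather than the country data, the object names and the `cos_angle()` helper are hypothetical, and the loadings are scaled by each component's standard deviation. The arrows drawn by `autoplot()` may be rescaled for display, so this shows the geometry rather than reproducing the plotted coordinates.

```r
# Stand-in data: USArrests ships with base R (NOT the data used in this analysis)
example_pca <- prcomp(scale(USArrests))

# Scale each loading column by the standard deviation of its component
scaled_loadings <- example_pca$rotation %*% diag(example_pca$sdev)

# With all components kept, inner products of the rows reproduce the correlation matrix
full_cor <- scaled_loadings %*% t(scaled_loadings)
all.equal(unname(full_cor), unname(cor(USArrests)))

# Keep only PC1 and PC2, as a biplot does
arrows_2d <- scaled_loadings[, 1:2]

# Squared arrow length: share of each variable's variance captured by PC1 and PC2
arrow_length_sq <- rowSums(arrows_2d^2)
round(arrow_length_sq, 3)

# Hypothetical helper: cosine of the angle between two arrows
cos_angle <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

# The cosine in the plotted plane approximates the correlation between the two variables
cos_angle(arrows_2d["Murder", ], arrows_2d["Assault", ])
cor(USArrests$Murder, USArrests$Assault)
```

The approximation is closest for variables with long arrows, since most of their variance lies in the plotted plane; angles between short arrows are less reliable guides to correlation.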